Tracking provenance in a virtual data grid

Authors

  • Ben Clifford
  • Ian T. Foster
  • Jens-S. Vöckler
  • Michael Wilde
  • Yong Zhao
Abstract

The virtual data model allows data sets to be described prior to, and separately from, their physical materialization. We have implemented this model in a Virtual Data Language (VDL) and associated supporting tools, which provide for both the storage, query, and retrieval of virtual data set descriptions, and the automated, on-demand materialization of virtual data sets. We use a standardized data provenance challenge exercise to illustrate the powerful queries that can be performed on the data maintained by these tools, which for a single virtual data set can include three elements: the computational procedure(s) that must be executed to materialize the data set, the runtime log(s) produced by the execution of the computation(s), and optional metadata annotation(s) that associate application semantics with data and procedures.
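As an illustration only, the separation between describing a data set and materializing it can be sketched in Python. This is not VDL and does not reflect the authors' tools: the names `Derivation`, `VirtualDataCatalog`, `describe`, `materialize`, and `lineage` are hypothetical, and the three pieces of state stand in loosely for the three elements named above (procedure, runtime log, metadata annotation).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Derivation:
    """Hypothetical description of how one data set is produced."""
    output: str
    inputs: Tuple[str, ...]
    procedure: Callable[..., object]
    annotations: Dict[str, str] = field(default_factory=dict)


class VirtualDataCatalog:
    """Toy catalog: stores descriptions, materializes on demand, keeps a log."""

    def __init__(self) -> None:
        self.derivations: Dict[str, Derivation] = {}
        self.materialized: Dict[str, object] = {}
        self.run_log: List[Tuple[str, Tuple[str, ...]]] = []

    def add_raw(self, name: str, value: object) -> None:
        """Register a data set that already exists physically."""
        self.materialized[name] = value

    def describe(self, derivation: Derivation) -> None:
        """Record a description without executing anything."""
        self.derivations[derivation.output] = derivation

    def materialize(self, name: str) -> object:
        """Compute a data set (and any missing inputs) only when requested."""
        if name in self.materialized:
            return self.materialized[name]
        d = self.derivations[name]
        args = [self.materialize(i) for i in d.inputs]
        value = d.procedure(*args)
        self.materialized[name] = value
        self.run_log.append((name, d.inputs))  # stand-in for a runtime log
        return value

    def lineage(self, name: str) -> List[str]:
        """List every data set that `name` depends on, nearest first."""
        d = self.derivations.get(name)
        if d is None:
            return []
        found: List[str] = []
        for i in d.inputs:
            for item in [i] + self.lineage(i):
                if item not in found:
                    found.append(item)
        return found


catalog = VirtualDataCatalog()
catalog.add_raw("raw", [3, 1, 2])
catalog.describe(Derivation("sorted", ("raw",), sorted, {"stage": "clean"}))
catalog.describe(Derivation("top", ("sorted",), lambda xs: xs[-1]))

# Up to this point "top" exists only as a description; nothing has run.
result = catalog.materialize("top")
```

After the final call, `catalog.run_log` records which derivations ran and in what order, and `catalog.lineage("top")` answers a simple provenance question from the stored descriptions alone, without touching the data.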


Similar resources

Target Tracking Based on Virtual Grid in Wireless Sensor Networks

One of the most important and typical applications of wireless sensor networks (WSNs) is target tracking. Although target tracking can provide benefits for large-scale WSNs and organize them into clusters, tracking a moving target in cluster-based WSNs suffers from a boundary problem. The main goal of this paper was to introduce an efficient and novel mobility management protocol, namely Target Tr...


Data provenance tracking as the basis for a biomedical virtual research environment

In complex data analyses it is increasingly important to capture information about the usage of data sets, in addition to their preservation over time, to ensure reproducibility of results, to verify the work of others, and to ensure that data have been used under appropriate conditions for specific analyses. Scientific workflow-based studies are beginning to realize the benefit of capturing this provenance ...


Migrating Scientific Workflow Management Systems from the Grid to the Cloud

Cloud computing is an emerging computing paradigm that can offer unprecedented scalability and resources on demand, and is gaining significant adoption in the science community. At the same time, scientific workflow management systems provide essential support and functionality to scientific computing, such as management of data and task dependencies, job scheduling and execution, provenance tr...


Provenance Support for Medical Research

This poster paper introduces a system known as CRISTAL [1] and the experience using it for medical research, primarily in the neuGRID [2] and neuGridforUsers (N4U) projects. These projects aim to provide detailed traceability for research analysis processes in the study of biomarkers for Alzheimer’s disease. They have faced major challenges in managing data volumes and algorithm complexity lead...


Steps Toward Managing Lineage Metadata in Grid Clusters

The lineage of a piece of data is of utility to a wide range of domains. Several application-specific extensions have been built to facilitate tracking the origin of the output that the software produces. In the quest to provide such support to extant programs, efforts have been recently made to develop operating system functionality for auditing filesystem activity to infer lineage relationshi...



Journal:
  • Concurrency and Computation: Practice and Experience

Volume 20

Publication date: 2008